Abstract: This paper presents a new approach to intra-cell
pilot contamination in crowded massive MIMO scenarios. The 
approach relies on two essential properties of a massive MIMO 
system, namely near-orthogonality between user channels and 
near-stability of channel powers. Signal processing techniques 
that take advantage of these properties allow us to view a set 
of contaminated pilot signals as a graph code on which iterative 
belief propagation can be performed. This makes it possible to 
decontaminate pilot signals and increase the throughput of the 
system. The proposed solution exhibits high performance with
large improvements over the conventional method. The improvements
come at the price of an increased error rate, although this
effect is shown to decrease significantly as the number of
antennas at the base station increases.

I. Introduction 

Multiple-input multiple-output (MIMO) has been identified 
as a key technology to improve spectral efficiency of wireless 
communication systems and is finding its way into practical 
systems, like LTE and LTE-Advanced. Research in MIMO
recently took a turn when the advantage of having a
massive number of antennas at a base station (BS) was asserted
in [1]. In [1], a massive MIMO system refers to a multi-cell
multi-user system with a massive number of antennas at the 
BS that serves multiple users. The number of users is much 
smaller than the number of BS antennas, defining an underdetermined
multi-user system with a massive number of extra
spatial degrees of freedom (DoF). Exploiting those extra DoF 
and assuming an infinite number of antennas at the BS, the 
multi-user MIMO channel can be turned into an orthogonal 
channel and the effect of small-scale fading and thermal 
noise can be eliminated. Based on those excellent properties, 
massive MIMO is acknowledged as a promising technology 
for very high system throughput and energy efficiency [2].

When the number of antennas becomes massive, acquiring 
the channel state information (CSI) becomes a severe bottleneck.
Downlink channel training requires a training length that
is proportional to the number of antennas at the BS and is thus
impractical. One solution promoted in [1] restricts massive
MIMO operations to time-division duplex (TDD), for which
channel reciprocity is exploited. As the downlink and uplink
channels are equal, CSI is acquired at the BS based on uplink
training and then used for downlink transmission. The benefit 
is that the training length is proportional to the number of 
users, which is much smaller than the number of BS antennas. 

As described in [1], CSI is acquired using orthogonal pilot
sequences, but, due to the shortage of orthogonal sequences, 
the same pilot sequences must be reused in neighboring cells, 
causing pilot contamination. This problem is considered as 
one of the major challenges in massive MIMO systems O. 
Mitigation of pilot contamination has been the focus of several
recent works. One approach exploits the fact that
the desired and interfering signals can be distinguished in the
channel covariance matrices, as long as the angle-of-arrival
spreads of desired and interfering signals do not overlap.
A pilot sequence coordination scheme is proposed to facilitate
satisfying this condition. The work in II utilizes coordination 
among base stations to share downlink messages. Each BS 
then performs linear combinations of messages intended for 
users applying the same pilot sequence. This is shown to eliminate
interference when the number of base station antennas
goes to infinity. A multi-cell precoding technique is used in m 
with the objective of not only minimizing the mean squared 
error of the signals within the cell, but also minimizing the 
interference imposed on other cells.

The survey of the related work indicates that the pilot 
contamination problem has been seen as an inter-cell problem 
that arises when the users associated with two neighboring 
cells use the same pilot sequence. An implicit assumption 
associated with it is that the pilot sequences of the users 
associated with the same cell are perfectly scheduled, such that 
no intra-cell pilot contamination occurs. These assumptions 
fall apart when one considers very dense, crowded scenarios 
as envisioned in 5G wireless scenarios ID In such a setting, 
orthogonal scheduling of the users belonging to the same BS 
becomes infeasible, due to scheduling overhead. 

In this work, we consider such a crowd scenario, where the
number of users and their access behavior make it infeasible
to schedule the transmissions. Instead, users choose pilot
sequences at random in an uncoordinated manner from a small 
pool shared by all users. Since the users are not coordinated, 
the pilot contamination problem can be cast as a random 
access problem. We identify two features specific to massive 
MIMO: (1) asymptotic orthogonality between user channels; 
and (2) asymptotic invariance of the power received from a 
user over a short time interval. We use these features in order
to formulate a pilot access protocol using the framework of
coded random access [8], [9]. In such a framework, knowledge
about the pilot applied by each individual user is not necessary
a priori. This will be discussed in more detail in Section III.

Fig. 1. A single cell crowd scenario.
Fig. 2. An example of a transmission schedule.
Fig. 3. An example of a pilot schedule.

The difference from existing approaches for coded random 
access is that the proposed protocol combines decoding of 
the data in the uplink with estimation of the channel, which 
can be used for downlink transmission. Moreover, the massive 
MIMO property of a stable norm makes it possible to apply the 
protocol in fading channels, which is not possible with existing 
approaches to coded random access. Overall, the solution 
proposed in this paper is a radical departure from the usual 
treatment of the pilot contamination problem and introduces 
an important link to the area of random access protocols. 

II. System Model 

In this work we denote scalars in lower case, vectors in 
bold lower case and matrices in bold upper case. A superscript 
“T” denotes the transpose and a superscript “H” denotes the
conjugate transpose. 

We consider a random access system consisting of a single 
base station with M antennas and K users with a single 
antenna, where M and K are in the hundreds or thousands, 
see Fig. 1. Communication is performed on a time-slotted
basis, where each time slot consists of four phases: an uplink
pilot phase, an uplink data phase, a downlink pilot phase and a
downlink data phase, see Fig. 2. The channel between the k'th
user and the base station in the n'th time slot is denoted
$\mathbf{h}_{nk} = [h_{nk}(1)\ h_{nk}(2)\ \ldots\ h_{nk}(M)]^T$, where $h_{nk}(i) \sim \mathcal{CN}(0,1)\ \forall\, i$.
It is assumed that $\mathbf{h}_{nk}\ \forall\, k$ are mutually orthogonal, which
is justified by the range of M. Moreover, it is assumed that
channel coefficients in different time slots are i.i.d., while the
channel power, $\|\mathbf{h}_{nk}\|^2$, remains constant within a
period of $\beta$ time slots. Note that the channel power varies due
to path loss and shadowing effects, which causes it to vary
much more slowly than the channel coefficients.
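
The two assumptions above, near-orthogonality and a stable channel power, can be checked numerically. The following Python sketch is an illustration only and is not part of the original evaluation: it draws two independent channels with CN(0,1) entries and reports the normalized power and the normalized cross-correlation. The function names (`draw_channel`, `hardening_stats`), the seed and the chosen values of M are assumptions made for this example.

```python
import random


def draw_channel(M, rng):
    """Draw M i.i.d. CN(0, 1) coefficients (variance 1/2 per real dimension)."""
    s = 0.5 ** 0.5
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(M)]


def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def hardening_stats(M, seed=0):
    """Return (||h1||^2 / M, |h1^H h2| / M) for two independent users."""
    rng = random.Random(seed)
    h1 = draw_channel(M, rng)
    h2 = draw_channel(M, rng)
    return inner(h1, h1).real / M, abs(inner(h1, h2)) / M


if __name__ == "__main__":
    # Hypothetical antenna counts, chosen only to show the trend in M.
    for M in (16, 256, 4096):
        power, cross = hardening_stats(M)
        print(f"M={M:5d}  ||h||^2/M={power:.3f}  |h1^H h2|/M={cross:.3f}")
```

As M grows, the first quantity should settle close to one and the second close to zero, which is the behavior the system model relies on.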

In each time slot, each user is active with probability
$p_a$. If a user is active, a random pilot sequence,
$\mathbf{s}_k = [s_k(1)\ s_k(2)\ \ldots\ s_k(\tau)]^T$, is chosen among a set of size $\tau$ with
mutually orthogonal pilot sequences. Note that multiple users
may choose the same pilot sequence. See Fig. 3 for an example
of a random pilot schedule with $\tau = 2$ and $K = 3$. By $\mathcal{A}_n$, we
denote all active users in time slot n and by $\mathcal{A}_n^j$ we denote the
set of users applying $\mathbf{s}_j$ in the n'th time slot. If $\mathbf{Y}_n^{pu}$ denotes
the uplink pilot signal received in time slot n, we have

$$\mathbf{Y}_n^{pu} = \sum_{j=1}^{\tau} \sum_{k \in \mathcal{A}_n^j} \mathbf{h}_{nk} \mathbf{s}_j^T + \mathbf{Z}_n^{pu}, \tag{1}$$

where $\mathbf{Z}_n^{pu}$ is a matrix of i.i.d. Gaussian noise components,
hence $Z_n^{pu}(i,j) \sim \mathcal{CN}(0, \sigma^2)\ \forall\, i, j$. Any future instances of
a vector z or matrix Z, with different sub- or superscripts,
follow the same definition. All active users transmit a message
of length L in the uplink data phase. The message from the
k'th user is denoted $\mathbf{x}_k^u = [x_k^u(1)\ x_k^u(2)\ \ldots\ x_k^u(L)]^T$. Denoting
the received uplink signal in time slot n as $\mathbf{Y}_n^u$, we then have

$$\mathbf{Y}_n^u = \sum_{k \in \mathcal{A}_n} \mathbf{h}_{nk} \mathbf{x}_k^{uT} + \mathbf{Z}_n^u. \tag{2}$$

In the downlink phase we rely on channel reciprocity, such that 
the uplink channel estimate is assumed to be a valid estimate of 
the downlink channel. The base station transmits a precoded 
downlink pilot sequence, such that the k'th user receives a 
downlink pilot signal, given by 

+ (3) 

where $\mathbf{w}_{nk} = [w_{nk}(1)\ w_{nk}(2)\ \ldots\ w_{nk}(M)]^T$ is the precoding
vector for user k in the n'th time slot. The base
station is able to schedule the downlink messages,
$\mathbf{x}_k^d = [x_k^d(1)\ x_k^d(2)\ \ldots\ x_k^d(L)]^T$, such that the received signal in the
downlink data phase is


III. Pilot Access Protocol 

This section describes the proposed method of communication
in the system described in Section II. The main focus
of this work is the uplink phase; however, a subsection is
dedicated to describing the operation in the downlink phase.


A. Uplink 

From the uplink pilot signals in (1), it is possible to estimate
the channels between the users and the base station. However,
since multiple users may apply the same pilot sequence, it is
only possible to estimate a sum of the involved channels. The
least squares estimate, $\boldsymbol{\phi}_{nj}$, based on the pilot signal in time
slot n from users applying $\mathbf{s}_j$ is found as

$$\boldsymbol{\phi}_{nj} = \left( (\mathbf{s}_j^H \mathbf{s}_j)^{-1} \mathbf{s}_j^H \mathbf{Y}_n^{puT} \right)^T = \sum_{k \in \mathcal{A}_n^j} \mathbf{h}_{nk} + \mathbf{z}'_n, \tag{5}$$

where $\mathbf{z}'_n$ is the impairment of the estimate caused by the
noise, $\mathbf{Z}_n^{pu}$. Any future instances of a vector z with a prime
follow the same definition.
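
To make the effect of a pilot collision concrete, the following Python sketch builds a noise-free uplink pilot signal and applies the least squares estimator to it. It is a toy illustration under stated assumptions: the helper names (`pilot_signal`, `ls_estimate`), the two length-2 pilots and the three small channel vectors are hypothetical and are not taken from the paper.

```python
def pilot_signal(channels, choices, pilots):
    """Noise-free uplink pilot signal: sum over users of h_k s_j^T (M x tau)."""
    M, tau = len(channels[0]), len(pilots[0])
    Y = [[0j] * tau for _ in range(M)]
    for h, j in zip(channels, choices):
        for m in range(M):
            for t in range(tau):
                Y[m][t] += h[m] * pilots[j][t]
    return Y


def ls_estimate(Y, s):
    """Least squares estimate ((s^H s)^-1 s^H Y^T)^T, i.e. Y s^* / (s^H s)."""
    energy = sum(abs(v) ** 2 for v in s)
    return [sum(y * v.conjugate() for y, v in zip(row, s)) / energy for row in Y]


if __name__ == "__main__":
    # Two mutually orthogonal pilots (tau = 2), hypothetical values.
    pilots = [[1 + 0j, 1 + 0j], [1 + 0j, -1 + 0j]]
    # Three users with M = 2 antennas at the base station (toy size).
    channels = [[1 + 1j, 2 + 0j], [0.5 + 0j, -1j], [3 + 0j, 1 + 0j]]
    choices = [0, 0, 1]  # users 1 and 2 collide on pilot 0, user 3 is alone
    Y = pilot_signal(channels, choices, pilots)
    print("estimate on pilot 0:", ls_estimate(Y, pilots[0]))  # h_1 + h_2
    print("estimate on pilot 1:", ls_estimate(Y, pilots[1]))  # h_3 only
```

The estimate on the shared pilot is the sum of the two colliding channels, while the estimate on the other pilot is uncontaminated.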

The problem of interfering users applying the same, or a
non-orthogonal, pilot sequence is often called pilot contamination.
If we proceed to detect the data in the uplink phase
using a contaminated channel estimate, the result will be a
summation of data messages. If $\boldsymbol{\psi}_{nj}$ is the data estimate based
on the channel estimate $\boldsymbol{\phi}_{nj}$, we have

(6)


Hence, a pilot collision leads to a data collision. In our system, 
one way to deal with this problem is to carefully select pa, 
such that the probability of having one and only one user 
applying a particular pilot sequence in a particular time slot 
is maximized. Hence, we have 


$$\underset{p_a}{\text{maximize}} \quad \Pr\left(|\mathcal{A}_n^j| = 1\right) \quad \text{subject to} \quad 0 < p_a < 1 \tag{7}$$


This will maximize the number of non-contaminated channel 
estimates, and in turn maximize the number of successful 
data transmissions. This approach is reminiscent of the framed 
slotted ALOHA protocol for conventional random access. We 
consider this a reference scheme in this work and refer to 
it as ALOHA. Note that a random access, i.e. nonscheduled, 
scheme must be considered as a reference, due to the assumption
of a crowded scenario, where scheduling is infeasible.

ALOHA has been state-of-the-art for many years within random
access protocols, but recently a paradigm shift has started
with the advent of coded random access [8], [9]. In this work, we
view the problem of pilot contamination as a random access 
problem and apply newly developed tools in this area to solve 
the problem. Two features from the massive MIMO scenario 
are essential to our solution; near-orthogonality between user 


channels and near-stability of channel powers. Through signal 
processing techniques they allow us to resolve pilot collisions 
and thereby utilize otherwise wasted resources. The solution 
can be viewed as a two-stage processing approach: 

1) Matched filter: The received uplink pilot and data
signals, in (1) and (2), are processed using matched
filters, which are constructed from the contaminated
estimates in (5). More specifically, we multiply the received
signals with $\boldsymbol{\phi}_{nj}^H$, creating filtered signals, denoted
$\mathbf{f}_{nj}$ and $\mathbf{g}_{nj}$ for data and pilots respectively. These
signals contain linear combinations of the data and pilots
transmitted by the users contributing to the contaminated
estimate, see (8) and (9). The relationship between
the variables we wish to estimate and the filtered signals
can be viewed as a factor graph, see Fig. 4 and Fig. 5.

2) Successive interference cancellation (SIC): The coefficients
of the linear combinations in (8) and (9) are the
squared two-norms, $\|\mathbf{h}_{nk}\|^2$, of the involved channels. In a massive
MIMO system, these can be assumed slowly fading,
contrary to the fast fading channel coefficients. Hence,
successive interference cancellation can be applied on
the filtered signals in order to reduce the linear combinations
to data signals from individual users. This requires
knowledge about the edges in the code graphs, i.e. what
pilots have been applied by the individual users and in
which time slots. This information is not available a
priori at the base station. However, it can be embedded in
the uplink data messages, such that when a data message
has been recovered, the base station is informed about
the pilot pattern chosen by the user. In practice, this
could be realized by embedding the seed for a pseudo
random number generator. Note that graph knowledge
is not necessary to initiate SIC, since a data message
can be recovered immediately when one and only one
user chose a particular pilot in a particular time slot. This
provides the necessary graph information to proceed with SIC
using belief propagation. The overhead resulting from
embedding graph information is considered negligible.

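$$\mathbf{f}_{nj} = \boldsymbol{\phi}_{nj}^H \mathbf{Y}_n^u = \sum_{k \in \mathcal{A}_n^j} \|\mathbf{h}_{nk}\|^2 \mathbf{x}_k^{uT} + \mathbf{z}'^{\,u}_{nj} \tag{8}$$

$$\mathbf{g}_{nj} = \boldsymbol{\phi}_{nj}^H \mathbf{Y}_n^{pu} = \sum_{k \in \mathcal{A}_n^j} \|\mathbf{h}_{nk}\|^2 \mathbf{s}_j^T + \mathbf{z}'^{\,pu}_{nj} \tag{9}$$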

The purpose of the matched filters is to transform the received 
signals from linear combinations with fast fading coefficients 
(the channel coefficients) into linear combinations with slowly 
fading coefficients (the norms). Note that the signals only 
remain linear combinations, when the channels are orthogonal, 
and that the coefficients are slowly fading only when the norms 
are stable. Both are fulfilled under the conditions given by a 
massive MIMO scenario. We can thus see the filtered signals,
$\mathbf{f}_{nj}$ and $\mathbf{g}_{nj}\ \forall\, j$ and $n = 1, 2, \ldots, \beta$, as a code on which
iterative belief propagation can be performed. See Fig. 6 for a
graph showing the inter-dependencies between $\mathbf{f}_{nj}$ and $\mathbf{g}_{nj}$.

Fig. 4. A graph representation of pilot collisions.
Fig. 5. A graph representation of data collisions.

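
The two-stage processing described above can be sketched as a peeling decoder. The Python code below is a simplified illustration and not the authors' implementation. It makes several assumptions that should be read as such: messages are single real numbers, the filtered signals are noise-free, the filtered pilot signal has already been projected onto its pilot so that it equals the sum of the squared norms, and the test for "exactly one user left in this resource block" is done by tracking the true user sets, which stands in for an integrity check (for example a CRC) that a real receiver would need. The pilot pattern revealed by a decoded message is modeled by looking up the `schedule` of the decoded user. All names and the example schedule are hypothetical; the schedule is not the one shown in Fig. 3.

```python
def build_blocks(schedule, norms, messages):
    """Form noise-free filtered signals per resource block (n, j).

    schedule[k] lists the resource blocks used by user k, norms[k] is the
    squared channel norm (constant over the frame), messages[k] is a scalar.
    """
    blocks = {}
    for k, used in schedule.items():
        for b in used:
            blk = blocks.setdefault(b, {"f": 0.0, "g": 0.0, "users": set()})
            blk["f"] += norms[k] * messages[k]  # filtered data signal
            blk["g"] += norms[k]                # filtered pilot signal
            blk["users"].add(k)                 # ground truth, CRC stand-in
    return blocks


def peel(blocks, schedule):
    """Successive interference cancellation on the code graph."""
    decoded = {}
    progress = True
    while progress:
        progress = False
        for blk in blocks.values():
            if len(blk["users"]) != 1:
                continue  # empty or still collided
            (k,) = blk["users"]
            norm = blk["g"]               # only one norm is left in g
            decoded[k] = blk["f"] / norm  # only one message is left in f
            # The decoded message reveals the user's pilot pattern, so its
            # contribution can be cancelled from every block it used.
            for b in schedule[k]:
                other = blocks[b]
                other["f"] -= norm * decoded[k]
                other["g"] -= norm
                other["users"].discard(k)
            progress = True
    return decoded


if __name__ == "__main__":
    schedule = {1: [(1, 1)], 2: [(1, 1), (2, 1)], 3: [(1, 2), (2, 2)]}
    norms = {1: 2.0, 2: 1.5, 3: 0.5}
    messages = {1: 7.0, 2: 3.0, 3: 5.0}
    print(peel(build_blocks(schedule, norms, messages), schedule))
```

In this toy schedule, user 2 is alone in block (2, 1), so its message and norm are recovered first and cancelled from block (1, 1), which then exposes user 1. If every block used by a group of users remains collided, the decoder stalls and those messages are lost.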
Example: Consider the simple example already introduced
in Fig. 3. We assume $\beta = 2$, such that the resulting graphs
after matched filtering are found in Fig. 4 and Fig. 5. Note
that since the norms are assumed constant, we can omit the
time index, such that $\|\mathbf{h}_k\|^2 = \|\mathbf{h}_{nk}\|^2\ \forall\, n$. We then have

Fig. 6. A graph representation of data and pilot collisions.


We introduce the variable c, which accounts for accumulated
noise components and estimation errors. Note that the magnitude
of the elements in c increases as processing progresses.
This will be discussed in further detail in Section IV.

Initially, we isolate the contribution from user 1, giving us 

$$\|\mathbf{h}_1\|^2 \mathbf{x}_1^{uT} + \mathbf{c} = \mathbf{f}_{11} - \mathbf{f}_{21}. \tag{11}$$

Since the applied pilot sequence is known a priori by the base 
station, we can find the norm as 

$$\|\mathbf{h}_1\|^2 + c = (\mathbf{g}_{11} - \mathbf{g}_{21})\, \mathbf{s}_1^{*} (\mathbf{s}_1^{T} \mathbf{s}_1^{*})^{-1}. \tag{12}$$

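Finally, the estimate of the message from user 1, $\hat{\mathbf{x}}_1$, is

$$\hat{\mathbf{x}}_1^{T} = \left( (\mathbf{g}_{11} - \mathbf{g}_{21})\, \mathbf{s}_1^{*} (\mathbf{s}_1^{T} \mathbf{s}_1^{*})^{-1} \right)^{-1} (\mathbf{f}_{11} - \mathbf{f}_{21}). \tag{13}$$

Similar operations can be performed for finding $\hat{\mathbf{x}}_2$ and $\hat{\mathbf{x}}_3$.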

B. Downlink 

In downlink we assume channel reciprocity, such that the
user does not need to estimate each coefficient of $\mathbf{h}_{nk}$, which
would require a pilot signal for all M antennas. Instead, we
let the receiver estimate the concatenated “channel” consisting
of both the downlink precoder, $\mathbf{w}_{nk}$, and the actual channel.
Denoting the concatenated channel $q_{nk}$, we have

$$q_{nk} = \mathbf{h}_{nk}^T \mathbf{w}_{nk}, \tag{14}$$

where $q_{nk}$ is estimated through (3).

In order to choose an appropriate precoder, the base station 
must have an estimate of the current channel. The coded 
operation applied in uplink does not guarantee that such an 
estimate is available. Uplink operation relies on SIC based 
only on knowledge of the norm. Hence, downlink transmission 
to a user is only possible if that user avoided collision during 
the previous uplink pilot phase, such that an uncontaminated 
channel estimate is available. This incurs a delay in downlink
transmissions, whose magnitude is analyzed in Section III-C.

C. Analysis 

The performances of the reference scheme and the proposed 
scheme are tightly connected with the factor node degree 
distribution of the code graph. Here a factor degree, denoted as
$d_{nj}$, refers to the number of users occupying the same resource
block, i.e. applying the j'th pilot sequence in the n'th time slot.
A user is active and applying pilot sequence j with probability
$p_a/\tau$, such that the degree probability distribution is

$$\Pr(d_{nj} = d) = \binom{K}{d} \left( \frac{p_a}{\tau} \right)^{d} \left( 1 - \frac{p_a}{\tau} \right)^{K-d}. \tag{15}$$


For the ALOHA scheme, we found that the optimal performance
is achieved when the probability of having $d_{nj} = 1$ is
maximized. Differentiating $\Pr(d_{nj} = 1)$ with respect to $p_a$
and finding the roots satisfying our conditions, we get that
$p_a = \tau/K$ maximizes the performance of the ALOHA scheme.
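
As a sanity check of the ALOHA operating point, the following Python sketch evaluates the binomial degree distribution and searches a grid of activation probabilities for the one that maximizes the probability of a degree-one resource block. The function names, the grid resolution and the example values K = 400 and tau = 5 are assumptions made for illustration.

```python
from math import comb


def degree_pmf(d, K, tau, p_a):
    """Pr(d_nj = d): each of K users hits a given resource block w.p. p_a / tau."""
    q = p_a / tau
    return comb(K, d) * q ** d * (1.0 - q) ** (K - d)


def aloha_optimum(K, tau, grid=10_000):
    """Grid search over p_a in (0, 1] for the maximizer of Pr(d_nj = 1)."""
    candidates = [i / grid for i in range(1, grid + 1)]
    return max(candidates, key=lambda p: degree_pmf(1, K, tau, p))


if __name__ == "__main__":
    K, tau = 400, 5  # hypothetical example values
    print("grid optimum:", aloha_optimum(K, tau), " tau/K:", tau / K)
```

The grid search only confirms the closed-form result obtained by differentiation; it is not needed in practice.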

We cannot use the same approach for optimizing the
proposed scheme, since resource blocks with $d_{nj} > 1$ may
be useful. Instead we must seek a well-performing degree distribution
that favors the iterative belief propagation. Several
works have studied this, e.g. [10], [11]; however, in this work
we cannot freely tailor the degree distribution. We are limited
to the binomial distribution as expressed in (15), with only
the freedom to choose a proper $p_a$. Similar limitations were
considered in ii with focus on choosing an average degree, 
$\bar{d}$, which was optimized numerically. In our context, we have

$$\bar{d} = \frac{K p_a}{\tau}. \tag{16}$$

A numerical optimization of $\bar{d}$ and thereby in turn $p_a$ for a
specific pair of K and $\tau$ will be performed in Section IV.

As described in Section III-B, downlink transmissions experience
a delay due to lack of channel knowledge. We denote
the delay for user k by $\Delta_k$. This delay is equal to the number of
time slots until user k is active and avoids a collision during
the uplink pilot phase. Denoting the probability of a user being
active and avoiding collision $p_a^*$, we have

$$p_a^* = p_a \left( 1 - \frac{p_a}{\tau} \right)^{K-1}. \tag{17}$$

The probability distribution of $\Delta_k$ follows the negative binomial
distribution with a single success, i.e. the geometric distribution,
and is therefore given by

$$\Pr(\Delta_k = \delta) = p_a^* (1 - p_a^*)^{\delta - 1}. \tag{18}$$

The expected value, $E[\Delta_k]$, of the delay is then found as

$$E[\Delta_k] = \frac{1}{p_a^*}. \tag{19}$$
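
The delay expressions can be evaluated directly. The Python sketch below computes the success probability, the delay distribution and the expected delay, and compares the closed-form mean with a truncated sum over the distribution. The function names and the example parameters (K = 400, tau = 5 and an activation probability corresponding to an average degree of 2.5) are assumptions chosen for illustration.

```python
def success_probability(p_a, K, tau):
    """Probability that a user is active and that no other user picks its pilot."""
    return p_a * (1.0 - p_a / tau) ** (K - 1)


def delay_pmf(delta, p_s):
    """Pr(delay = delta) for per-slot success probability p_s (geometric)."""
    return p_s * (1.0 - p_s) ** (delta - 1)


def expected_delay(p_a, K, tau):
    """Mean number of time slots until the first collision-free pilot."""
    return 1.0 / success_probability(p_a, K, tau)


if __name__ == "__main__":
    K, tau = 400, 5          # hypothetical example values
    p_a = 2.5 * tau / K      # activation probability for average degree 2.5
    p_s = success_probability(p_a, K, tau)
    numeric_mean = sum(d * delay_pmf(d, p_s) for d in range(1, 50_001))
    print("closed form:", expected_delay(p_a, K, tau))
    print("truncated sum:", numeric_mean)
```

Raising the activation probability beyond the ALOHA optimum increases the number of collisions and therefore the expected downlink delay, which is the tradeoff discussed next.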

There is a natural tradeoff between optimizing pa for high 
uplink throughput and optimizing it for limiting the delay in 
the downlink phase. Such a joint optimization is outside the 
scope of this work. In the numerical evaluations in Section IV,
we will solely be concerned with the uplink throughput.

IV. Numerical Results 

The proposed scheme is simulated and compared to framed 
slotted ALOHA in terms of uplink throughput and block 
error rate. Framed slotted ALOHA does not utilize SIC, but
optimizes performance through a maximization of degree one
nodes in the code graph, see (7). The proposed scheme is based
on an assumption that the channel coefficients in different time
slots are i.i.d., while their two-norms remain constant within
a period of $\beta$ time slots. In the numerical evaluations, we
will challenge these assumptions by simulating with fading
channels. A rich scattering environment is assumed, such that
$h_{nk}(m)$ can be modeled using Clarke's model [12], hence

$$h_{nk}(m) = \frac{1}{\sqrt{N_s}} \sum_{i=1}^{N_s} e^{j(2\pi f_d n t_s \cos\alpha_i + \phi_i)}, \tag{20}$$

where $N_s$ is the number of scatterers, $f_d$ is the maximum
Doppler shift, and $\alpha_i$ and $\phi_i$ are the angle of arrival and initial
phase, respectively, of the wave from the i'th scatterer. Both
$\alpha_i$ and $\phi_i$ are i.i.d. in the interval $[-\pi, \pi)$ and $f_d = \frac{v}{c} f_c$, where
v is the speed of the user, c is the speed of light and $f_c$ is the
carrier frequency. An overview of the simulation parameters
is given in Table I. Note that $\beta = 1.2K/\tau$ is chosen in order
to ensure a 20% surplus of resource blocks relative to K,
such that the iterative belief propagation performs well. All
simulation results are averages over 10,000 iterations.
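
A sum-of-sinusoids generator in the spirit of Clarke's model can be sketched as follows. This Python code is an illustration and not the simulator used for the results: it assumes that the time argument of slot n is n times the slot length, that the angles and phases are drawn uniformly, and that one coefficient is generated per antenna independently. The function names are hypothetical; the parameter values in the demo follow the simulation parameter table.

```python
import cmath
import math
import random


def doppler_shift(v_kmh, f_c):
    """Maximum Doppler shift f_d = (v / c) * f_c, with v given in km/h."""
    c = 299_792_458.0  # speed of light in m/s
    return (v_kmh / 3.6) * f_c / c


def clarke_channel(num_slots, t_s, f_d, n_scatterers, rng):
    """One fading coefficient per time slot from a sum of n_scatterers waves."""
    alpha = [rng.uniform(-math.pi, math.pi) for _ in range(n_scatterers)]
    phi = [rng.uniform(-math.pi, math.pi) for _ in range(n_scatterers)]
    scale = 1.0 / math.sqrt(n_scatterers)
    samples = []
    for n in range(num_slots):
        t = n * t_s  # assumed time of the n'th slot
        total = sum(
            cmath.exp(1j * (2.0 * math.pi * f_d * t * math.cos(a) + p))
            for a, p in zip(alpha, phi)
        )
        samples.append(scale * total)
    return samples


if __name__ == "__main__":
    f_d = doppler_shift(3.0, 1.8e9)  # 3 km/h at 1.8 GHz
    h = clarke_channel(5, 0.01, f_d, 20, random.Random(0))
    print("f_d [Hz]:", round(f_d, 3))
    print("|h| per slot:", [round(abs(x), 3) for x in h])
```

With these parameters the Doppler shift is about 5 Hz, so the coefficient changes only gradually from one 0.01 s slot to the next, in contrast to the i.i.d. assumption of the system model.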

Initially, in Fig. 7, we present results for the normalized
throughput of the proposed scheme as a function of the average
degree of a resource block, which is directly related to the
activation probability, as seen in (16). We define normalized
throughput as the total number of successfully decoded messages
divided by $\beta\tau$, i.e. the total number of resource blocks. It
is evident that an average degree of approximately 2.5 should
be aimed for in the considered range of K, which is confirmed
by the results from 0. All other simulations are performed 
using an average degree of 2.5 regardless of K. Note that
improved performance could be achieved by optimizing the 
activation probability to a particular value of K. 

Fig. [8] shows normalized goodput, i.e. throughput with 
erroneous messages discarded, as a function of the number 
of users accessing the base station. The proposed scheme 
clearly outperforms the conventional method of framed slotted 
ALOHA. The improvement increases with K, since the proposed
scheme benefits from a larger number of messages to
code across. An increase in K can be viewed as an increase 
in the block length, which improves coding efficiency. 

The coding gain comes at the price of an increased error 
rate. As mentioned in Section III, whenever SIC is performed,
noise and estimation errors are accumulated, which may lead
to errors. At higher K, it is more common to see high degrees
in the code graph, even if the average degree remains constant.
Moreover, SIC is performed across a larger time span, which
leads to larger errors in the norm estimation. As a result, we
experience an increased error rate for increasing K, which
is illustrated in Fig. 9. It also shows that the error rate drops
significantly, as the number of base station antennas increases. 
The reason is that the norm stabilizes for increasing M, 
making the assumption of a constant norm increasingly valid. 

V. Conclusions 

We presented a solution for the pilot contamination problem
in crowded scenarios, where users within a single cell
must share a small set of pilot sequences. We view intra-cell
pilot contamination as a random access problem and
draw on newly developed ideas from this area of research.
The massive MIMO setting provides two essential properties:
near-orthogonality between user channels and near-stability
of channel powers. These properties allow us to view a set
of contaminated pilot signals as a graph code on which
iterative belief propagation can be performed. The proposed
solution proves highly efficient, comfortably outperforming
the conventional ALOHA approach to random access. The
price to pay is an increased error rate, due to accumulation of
estimation errors in the belief propagation algorithm. However,
this downside is shown to significantly diminish as the number
of base station antennas increases.

TABLE I
Simulation parameters

Parameter    Value       Description
$f_c$        1.8 GHz     Carrier frequency
$v$          3 km/h      User mobility
$N_s$        20          Number of scatterers
$\sigma^2$   0.1         Relative noise power
$\tau$       5 bits      Length and number of pilot sequences
$t_s$        0.01 s      Length of a time slot
$L$          1000 bits   Length of uplink data messages
$\beta$      1.2K/$\tau$ Number of time slots

Fig. 7. Throughput as a function of the average degree of a resource block.